A SIMPLE SMOOTH BACKFITTING METHOD FOR ADDITIVE MODELS

By Enno Mammen and Byeong U. Park

University of Mannheim and Seoul National University

In this paper a new smooth backfitting estimate is proposed for
additive regression models. The estimate has the simple structure of
Nadaraya-Watson smooth backfitting but at the same time achieves
the oracle property of local linear smooth backfitting. Each component
is estimated with the same asymptotic accuracy as if the other
components were known.

1. Introduction. In additive models it is assumed that the influence of
different covariates enters separately into the regression model and that the
regression function can be modeled as the sum of the single influences. This
is often a plausible assumption. It circumvents fitting of high-dimensional
curves and for this reason it avoids the so-called curse of dimensionality. On
the other hand, it is a very flexible model that also allows good
approximations for more complex structures. Furthermore, the low-dimensional
curves fitted in the additive model can be easily visualized in plots. This
allows a good data-analytic interpretation of the qualitative influence of
single covariates.

In this paper we propose a new backfitting estimate for additive regression 
models. The estimate is a modification of the smooth backfitting estimate 
of Mammen, Linton and Nielsen [9]. Their versions of smooth backfitting 
have been introduced for Nadaraya-Watson smoothing and for local linear
smoothing. Smooth backfitting based on Nadaraya-Watson smoothing has
the advantage of being easily implemented and of having rather simple
intuitive interpretations. On the other hand, local linear smooth backfitting
leads to more complicated technical implementations. The backfitting
formula has no easy interpretation. But the local linear smooth backfitting
estimate has very nice asymptotic properties. It achieves the asymptotic 
oracle bounds. The local linear smooth backfitting estimate of an additive 
component has the same asymptotic bias and variance as a theoretical local 
linear estimate that uses knowledge of the other components. In this paper 
we introduce a smooth backfitting estimate that has the simple structure 
of a Nadaraya-Watson estimate but at the same time has the asymptotic 
oracle property of local linear smoothing. 

Several approaches have been proposed for fitting additive models: the 
classical backfitting procedure by Buja, Hastie and Tibshirani [1], the marginal 
integration method of Linton and Nielsen [8] and Tjøstheim and Auestad
[16], the smooth backfitting estimate of Mammen, Linton and Nielsen [9], the
local quasi-differencing approach of Christopeit and Hoderlein [2], and the
two-step procedures of Horowitz, Klemelä and Mammen [5]. All estimates
require several estimation steps. 

The marginal integration estimate makes use of a full-dimensional non- 
parametric regression estimate as a pilot estimate. Each component of the 
additive model is fitted by marginal integration of the full-dimensional fit, 
that is, by integrating out all other arguments. Versions of marginal inte- 
gration have been proposed that achieve oracle bounds [4]. The algorithm is 
unstable for moderate and large numbers of additive components and cal- 
culation of the full-dimensional regression estimate causes problems. On the 
other hand, backfitting avoids fitting a full-dimensional regression estimate. 
It is based on an iterative algorithm. In each step only one additive compo- 
nent is updated. All other components are fixed. So, only one-dimensional 
smoothing is applied. Asymptotic theory for the classical backfitting is com- 
plicated by the fact that the estimate is defined as a limit of the iterative 
backfitting algorithm but no explicit definition is given. Asymptotic theory 
is available under restrictive conditions on the design densities [13, 14]. In 
general, the classical backfitting estimates do not achieve the oracle bounds. 
For practical implementations of the backfitting estimates, see [15]. 
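
The iterative scheme described above (update one additive component at a
time, keep the others fixed, apply only one-dimensional smoothing) can be
sketched as follows. This is a minimal illustration and not the implementation
referred to in [15]: the names `nw_smooth` and `classical_backfit`, the
biweight weights and the fixed number of sweeps are assumptions made for the
example.

```python
import numpy as np

def nw_smooth(x, r, h):
    """Nadaraya-Watson fit of r on x, evaluated at the observations x."""
    t = (x[:, None] - x[None, :]) / h
    w = np.where(np.abs(t) <= 1, (1 - t**2) ** 2, 0.0)  # biweight, unnormalized
    return (w @ r) / w.sum(axis=1)  # row sums are positive (self weight is 1)

def classical_backfit(X, Y, h, n_sweeps=20):
    """Classical backfitting sketch: cycle over components, smoothing the
    partial residuals of each one against its own covariate."""
    n, d = X.shape
    m0 = Y.mean()
    fits = np.zeros((n, d))  # fits[i, j] approximates m_j(X_j^i)
    for _ in range(n_sweeps):
        for j in range(d):
            # Residual after removing the constant and all other components.
            partial = Y - m0 - fits.sum(axis=1) + fits[:, j]
            fits[:, j] = nw_smooth(X[:, j], partial, h[j])
            fits[:, j] -= fits[:, j].mean()  # empirical centering
    return m0, fits
```

Note that the result is only defined as the limit of the sweeps, which is the
feature that complicates the asymptotic theory mentioned above.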

Smooth backfitting estimates are defined as the minimizers of a smoothed 
least squares criterion. As backfitting estimates they can be calculated by 
an iterative backfitting algorithm. Asymptotic analysis becomes simpler be- 
cause of the explicit definition of the estimate. Furthermore, making use of 
an approach in [10], the estimate can be interpreted as an orthogonal pro- 
jection of the data vector onto the space of additive functions. As with the 
classical backfitting estimates, smooth backfitting does not make use of a 
full-dimensional estimate and for this reason it avoids the curse of dimen- 
sionality. Smooth backfitting also achieves the oracle bounds. This has been 
shown for smooth backfitting estimates based on local linear fitting (see [9]).
For practical implementations of smooth backfitting, see [12] and [11]. Some
two-step procedures have been proposed for additive models. Christopeit
and Hoderlein [2] use local quasi-differencing in the second step, an idea
coming from efficient estimation in semiparametric models. Horowitz,
Klemelä and Mammen [5] and Horowitz and Mammen [6] develop a general
approach that allows oracle efficient estimates for a broad class of smoothing 
methods. For a related approach, see also [7]. 

In the original version of local linear smooth backfitting, both the esti- 
mated value and the estimated slope of an additive component are updated. 
This is done by application of a two-dimensional integral operator. This 
definition leads to lengthy formulas, which makes it hard to implement the 
method. Furthermore, the understanding of the method and of its asymptotic
properties is complicated by the two-dimensional nature of the integral
operator. On the other hand, smooth backfitting for Nadaraya-Watson
smoothing is rather simple to understand and it can be rather easily
implemented. Again, an integral operator is used in the backfitting steps. But
now the operator can be easily interpreted as an empirical analogue of a 
conditional expectation. In this paper we propose a smooth backfitting es- 
timate that inherits the advantages of Nadaraya-Watson and local linear 
smooth backfitting. As with Nadaraya-Watson smoothing, it is based on 
one-dimensional updating. This essentially simplifies the interpretation and 
asymptotic analysis of the algorithm. On the other hand, the new estimate 
achieves the asymptotic oracle bounds of local linear smooth backfitting. Our 
numerical study confirms this asymptotic equivalence, and suggests that the 
new estimate has a slightly better performance. 

The paper is organized as follows. In the next section the method is intro- 
duced and is shown to be asymptotically equivalent to local linear smooth 
backfitting under some conditions on the kernel functions of the backfitting 
operator. Section 3 discusses some numerical properties of the new proposal. 
The assumptions for our theoretical results and proofs are deferred to Sec- 
tion 4. 

2. Local linear smooth backfitting. In this section we introduce our new
smooth backfitting method for local linear smoothing. It is based on a
modification of smooth backfitting for Nadaraya-Watson smoothing. We briefly
recall the definition of Nadaraya-Watson backfitting from Mammen, Linton
and Nielsen [9]. We consider an additive model. For $i = 1, \ldots, n$, it is
assumed for one-dimensional response variables $Y^1, \ldots, Y^n$ that

(2.1)    $Y^i = m_0 + m_1(X_1^i) + \cdots + m_d(X_d^i) + \varepsilon^i.$

Here, $\varepsilon^i$ are error variables, $m_1, \ldots, m_d$ are unknown
functions from $\mathbb{R}$ to $\mathbb{R}$ satisfying $E m_j(X_j^i) = 0$,
$m_0$ is an unknown constant and $X^i = (X_1^i, \ldots, X_d^i)$ are random
design points in $\mathbb{R}^d$. Throughout the paper we make the assumption
that $X^1, \ldots, X^n$ are i.i.d. and that $X_j^i$ takes its values in a
bounded interval $I_j$. Furthermore, the error variables
$\varepsilon^1, \ldots, \varepsilon^n$ are assumed to be i.i.d. with mean zero
and to be independent of $X^1, \ldots, X^n$. This excludes interesting
autoregression models, but it simplifies our asymptotic analysis. We expect
that our results can be extended to dependent observations under mixing
conditions.

The Nadaraya-Watson smooth backfitting estimate is defined as the
minimizer of the smoothed sum of squares

(2.2)    $\sum_{i=1}^n \int_I [Y^i - \hat m_0 - \hat m_1(x_1) - \cdots - \hat m_d(x_d)]^2 \kappa\Bigl(\frac{X_1^i - x_1}{h_1}, \ldots, \frac{X_d^i - x_d}{h_d}\Bigr) dx,$

where $\kappa$ is a $d$-variate kernel function and $I = I_1 \times \cdots \times I_d$.
The minimization is done under the constraints

(2.3)    $\int_{I_j} \hat m_j(x_j) \hat p_j(x_j) dx_j = 0, \quad j = 1, \ldots, d,$

where $\hat p_j$ is a marginal kernel density estimate. The minimizer $\hat m_j$ of (2.2)
is uniquely defined by the equations (see [9])

(2.4)    $\hat m_j(x_j) = \tilde m_j(x_j) - \sum_{k \ne j} \int_{I_k} \hat m_k(x_k) \hat\pi_{jk}(x_j, x_k) dx_k, \quad j = 1, \ldots, d,$

where $\tilde m_j$ is a normalized marginal Nadaraya-Watson estimate and $\hat\pi_{jk}$ is a
kernel density estimate of the conditional density $p_{jk}/p_j$. Here $p_{jk}$ denotes
the marginal density of $(X_j^i, X_k^i)$.
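
The backfitting equations for the Nadaraya-Watson case lend themselves to a
simple iteration on a grid: each component is set to its centred marginal fit
minus the integrals of the other components against the estimated conditional
densities. The sketch below illustrates this under several assumptions that
are not taken from the text: a common equispaced grid on all intervals,
Riemann sums in place of integrals, product biweight kernels rescaled so that
each observation's weights integrate to one, $\hat\pi_{jk} = \hat p_{jk}/\hat p_j$,
and centring of the marginal fits by the sample mean of the responses. The
name `smooth_backfit_nw` is hypothetical.

```python
import numpy as np

def biweight(t):
    """Biweight kernel on [-1, 1]."""
    return np.where(np.abs(t) <= 1, 15.0 / 16.0 * (1 - t**2) ** 2, 0.0)

def smooth_backfit_nw(X, Y, h, grid, n_sweeps=50):
    """Nadaraya-Watson smooth backfitting on a common equispaced grid (sketch)."""
    n, d = X.shape
    dx = grid[1] - grid[0]
    # W[j][g, i]: kernel weight of observation i at grid point g, rescaled so
    # that each observation's weights integrate (Riemann sum) to one.
    W = []
    for j in range(d):
        w = biweight((grid[:, None] - X[None, :, j]) / h[j]) / h[j]
        W.append(w / (w.sum(axis=0, keepdims=True) * dx))
    p = [w.mean(axis=1) for w in W]                          # marginal density estimates
    m0 = Y.mean()
    m_tilde = [W[j] @ Y / n / p[j] - m0 for j in range(d)]   # centred marginal fits
    p2 = {(j, k): W[j] @ W[k].T / n                          # two-dimensional density estimates
          for j in range(d) for k in range(d) if j != k}
    m = [mt.copy() for mt in m_tilde]
    for _ in range(n_sweeps):
        for j in range(d):
            update = m_tilde[j].copy()
            for k in range(d):
                if k != j:
                    # Riemann-sum version of: integral of m_k * p_jk / p_j over x_k
                    update -= (p2[(j, k)] @ m[k]) * dx / p[j]
            m[j] = update
    return m0, m
```

With this rescaling the discretized two-dimensional estimate integrates to the
marginal one, so the norming constraint is preserved by every update.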

In this paper we propose to use other choices of $\tilde m_j$ and $\hat\pi_{jk}$, and define
a new estimate by (2.4) with these new choices. Let $\check m_j$ be the marginal
local linear estimate. Together with the slope estimate $\check m_j^*$ the local linear
estimate is defined as the minimizer of

(2.5)    $\sum_{i=1}^n [Y^i - \check m_j(x_j) - \check m_j^*(x_j)(X_j^i - x_j)]^2 K_{h_j}(x_j, X_j^i),$

where $K_{h_j}$ is a boundary corrected univariate kernel function. It is defined
as

         $K_{h_j}(u_j, v_j) = [a(u_j, h_j) v_j + b(u_j, h_j)] h_j^{-1} K\Bigl(\frac{v_j - u_j}{h_j}\Bigr),$

where $K$ is a symmetric convolution kernel (i.e., a probability density
function) supported on $[-1, 1]$. The functions $a$ and $b$ are chosen so that

(2.6)    $\int_{I_j} K_{h_j}(u_j, v_j) dv_j = 1,$

(2.7)    $\int_{I_j} (v_j - u_j) K_{h_j}(u_j, v_j) dv_j = 0.$

We also write $K_{h_j}(v_j - u_j)$ for the kernel $h_j^{-1} K[(v_j - u_j)/h_j]$. This kernel
should not be confused with $K_{h_j}(u_j, v_j)$. Specifically,

(2.8)    $K_{h_j}(u_j, v_j) = \frac{\mu_{K,j,2}(u_j) - h_j^{-1}(v_j - u_j)\mu_{K,j,1}(u_j)}{\mu_{K,j,0}(u_j)\mu_{K,j,2}(u_j) - \mu_{K,j,1}(u_j)^2} K_{h_j}(v_j - u_j),$

where

         $\mu_{K,j,\ell}(u_j) = \int_{I_j} (v_j - u_j)^\ell h_j^{-\ell} K_{h_j}(v_j - u_j) dv_j = \int_{I_j(u_j,h_j,+)} t^\ell K(t) dt$

for $I_j(u_j, h_j, +) = \{t : u_j + h_j t \in I_j\}$.
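
The boundary corrected kernel can be checked numerically. The sketch below
reads the kernel as the convolution kernel multiplied by the factor
$[\mu_2 - t\mu_1]/[\mu_0\mu_2 - \mu_1^2]$ with $t = (v_j - u_j)/h_j$ and
truncated moments $\mu_\ell = \int t^\ell K(t) dt$ over $\{t : u_j + h_j t \in I_j\}$,
and verifies that it integrates to one and has vanishing first moment in
$v_j$. The biweight kernel, the interval $[0, 1]$, the trapezoidal rule and
the function names are assumptions chosen for illustration.

```python
import numpy as np

def biweight(t):
    """Biweight kernel on [-1, 1]."""
    return np.where(np.abs(t) <= 1, 15.0 / 16.0 * (1 - t**2) ** 2, 0.0)

def trapezoid(y, x):
    """Plain trapezoidal rule."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def mu(u, h, ell, lo=0.0, hi=1.0, n_pts=4001):
    """Truncated moment: integral of t^ell K(t) over {t : u + h t in [lo, hi]}."""
    a = max(-1.0, (lo - u) / h)
    b = min(1.0, (hi - u) / h)
    t = np.linspace(a, b, n_pts)
    return trapezoid(t**ell * biweight(t), t)

def K_h(u, v, h, lo=0.0, hi=1.0):
    """Boundary corrected kernel evaluated at (u, v); v may be an array in [lo, hi]."""
    m0, m1, m2 = (mu(u, h, ell, lo, hi) for ell in range(3))
    t = (v - u) / h
    return (m2 - t * m1) / (m0 * m2 - m1**2) * biweight(t) / h
```

In the interior, where the truncated moments are $1$, $0$ and $\int t^2 K(t) dt$,
the correction factor is one and the kernel reduces to the convolution kernel.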

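The normalized marginal estimate $\tilde m_j$ is defined as

(2.9)    $\tilde m_j(x_j) = \check m_j(x_j) - \Bigl[\int \tilde p_j(u) du\Bigr]^{-1} \int \check m_j(u) \tilde p_j(u) du$

for a modified density estimate $\tilde p_j$. The modified kernel density estimate $\tilde p_j$
is defined as

         $\tilde p_j(x_j) = \hat p_j(x_j) - \frac{\hat p_j^*(x_j)^2}{\hat p_j^{**}(x_j)},$

where $\hat p_j$ is the usual kernel density estimate,

         $\hat p_j(x_j) = \frac{1}{n}\sum_{i=1}^n K_{h_j}(x_j, X_j^i),$

         $\hat p_j^*(x_j) = \frac{1}{n}\sum_{i=1}^n K_{h_j}(x_j, X_j^i)(X_j^i - x_j),$

         $\hat p_j^{**}(x_j) = \frac{1}{n}\sum_{i=1}^n K_{h_j}(x_j, X_j^i)(X_j^i - x_j)^2.$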
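
The modified density estimate is exactly the denominator that appears when the
local linear criterion is solved for the level. Writing $p$, $p^*$, $p^{**}$
for the kernel-weighted averages of $1$, $(X_j^i - x_j)$ and $(X_j^i - x_j)^2$,
and $T_0$, $T_1$ for the kernel-weighted averages of $Y^i$ and
$(X_j^i - x_j) Y^i$, the local linear level equals
$(T_0 - T_1 p^*/p^{**}) / (p - p^{*2}/p^{**})$. The following sketch checks this
identity numerically. As a simplification it uses the plain convolution
biweight kernel rather than the boundary corrected one, and the function names
are hypothetical.

```python
import numpy as np

def biweight(t):
    """Biweight kernel on [-1, 1]."""
    return np.where(np.abs(t) <= 1, 15.0 / 16.0 * (1 - t**2) ** 2, 0.0)

def local_linear(x, Xj, Y, h):
    """Minimize sum_i [Y_i - a - b (X_i - x)]^2 w_i and return (a, b)
    by solving the 2 x 2 normal equations."""
    w = biweight((Xj - x) / h) / h
    D = np.column_stack([np.ones_like(Xj), Xj - x])
    A = D.T @ (w[:, None] * D)
    rhs = D.T @ (w * Y)
    return np.linalg.solve(A, rhs)

def modified_density(x, Xj, h):
    """Return (p_tilde, p, p_star, p_2star) at the point x."""
    w = biweight((Xj - x) / h) / h
    p = w.mean()
    p_star = (w * (Xj - x)).mean()
    p_2star = (w * (Xj - x) ** 2).mean()
    return p - p_star**2 / p_2star, p, p_star, p_2star
```

By the Cauchy-Schwarz inequality the modified estimate never exceeds the usual
kernel density estimate when the weights are nonnegative.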



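For the definition of $\hat\pi_{jk}$, we consider the two-dimensional kernel density
estimate

         $\tilde p_{jk}(x_j, x_k) = \hat p_{jk}(x_j, x_k) - \frac{\hat p_{jk}^*(x_j, x_k)\hat p_j^*(x_j)}{\hat p_j^{**}(x_j)},$

where

         $\hat p_{jk}(x_j, x_k) = \frac{1}{n}\sum_{i=1}^n K_{h_j}(x_j, X_j^i) L_{h_k}(x_k, X_k^i),$

         $\hat p_{jk}^*(x_j, x_k) = \frac{1}{n}\sum_{i=1}^n K_{h_j}(x_j, X_j^i) L_{h_k}(x_k, X_k^i)(X_j^i - x_j).$

The kernel $L_{h_k}$ is defined as

         $L_{h_k}(u_k, v_k) = [c(v_k, h_k) u_k + d(v_k, h_k)] h_k^{-1} L\Bigl(\frac{v_k - u_k}{h_k}\Bigr),$

where $c$ and $d$ are chosen so that

(2.10)   $\int_{I_k} L_{h_k}(u_k, v_k) du_k = 1,$

(2.11)   $\int_{I_k} (v_k - u_k) L_{h_k}(u_k, v_k) du_k = 0.$

Specifically,

         $L_{h_k}(u_k, v_k) = \frac{\mu_{L,k,2}(v_k) - h_k^{-1}(v_k - u_k)\mu_{L,k,1}(v_k)}{\mu_{L,k,0}(v_k)\mu_{L,k,2}(v_k) - \mu_{L,k,1}(v_k)^2} L_{h_k}(v_k - u_k),$

where

         $\mu_{L,k,\ell}(v_k) = \int_{I_k} (v_k - u_k)^\ell h_k^{-\ell} L_{h_k}(v_k - u_k) du_k = \int_{I_k(v_k,h_k,-)} t^\ell L(t) dt$

for $I_k(v_k, h_k, -) = \{t : v_k - h_k t \in I_k\}$. We use the following convolution kernel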

L{u)=2Ky^iu)-K^iu). 

This kernel satisfies $\int L(u) du = 1$, $\int u L(u) du = 0$ and
$\int u^2 L(u) du = -\int u^2 K(u) du$. Other kernels with these moments will also work.
Again, we also write $L_{h_j}(v_j - u_j)$ for the kernel $h_j^{-1} L[(v_j - u_j)/h_j]$. Note
that the definition of $L_{h_k}$ differs from that of $K_{h_j}$. The difference comes from
integration with respect to different arguments in the moment equations. Note
also that the moment condition (2.10) is required on their kernels $K^*_{h_k}$ (as
well as $K^*_{h_j}$) for the local linear smooth backfitting estimate proposed by
Mammen, Linton and Nielsen [9]. The additional condition (2.11) on the first
moment is needed here to mimic local linear smooth backfitting with a
Nadaraya-Watson-type estimate.
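
Since any kernel with the three stated moments will do, one way to see that
such kernels exist is to construct one. The sketch below is a hypothetical
example and not the kernel used in the paper: it takes $L(u) = (a + b u^2) K(u)$
for the biweight kernel $K$ and solves the two linear equations
$a\mu_0 + b\mu_2 = 1$ and $a\mu_2 + b\mu_4 = -\mu_2$ for $a$ and $b$, where
$\mu_k = \int u^k K(u) du$. The first moment vanishes by symmetry.

```python
import numpy as np

def biweight(u):
    """Biweight kernel on [-1, 1]."""
    return np.where(np.abs(u) <= 1, 15.0 / 16.0 * (1 - u**2) ** 2, 0.0)

def trapezoid(y, x):
    """Plain trapezoidal rule."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def moments(f, orders, n_pts=20001):
    """Moments of f over [-1, 1] for the requested orders."""
    u = np.linspace(-1.0, 1.0, n_pts)
    return {k: trapezoid(u**k * f(u), u) for k in orders}

mK = moments(biweight, orders=(0, 2, 4))

# Solve  a*mu0 + b*mu2 = 1  and  a*mu2 + b*mu4 = -mu2  for (a, b).
a, b = np.linalg.solve([[mK[0], mK[2]], [mK[2], mK[4]]], [1.0, -mK[2]])

def L(u):
    """Illustrative kernel with integral 1, first moment 0, second moment -mu2."""
    return (a + b * u**2) * biweight(u)

mL = moments(L, orders=(0, 1, 2))
```

Such a kernel necessarily takes negative values, because its second moment is
negative.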
We now define $\hat\pi_{jk}$ as

(2.12)   $\hat\pi_{jk}(x_j, x_k) = \frac{\tilde p_{jk}(x_j, x_k)}{\tilde p_j(x_j)} - \frac{\int \tilde p_{jk}(u, x_k) du}{\int \tilde p_j(u) du}.$

Our main result states that the estimate $\hat m_j$ is asymptotically equivalent
to local linear smooth backfitting estimates. We will give motivation for the
choice of $\hat\pi_{jk}$ at the end of this section.

Theorem 2.1. Under Assumptions (A1)-(A5) stated in Section 4, we
get the following expansions for the estimate $\hat m_j$ defined by (2.4) with $\tilde m_j$ at
(2.5)-(2.9) and $\hat\pi_{jk}$ at (2.12):

(2.13)   $\hat m_j(x_j) = m_j(x_j) + h_j^2\Bigl[\frac12 C_{K,j,2}(x_j) m_j''(x_j) + \Delta_j\Bigr] + \Bigl[\frac1n\sum_{i=1}^n K_{h_j}(x_j, X_j^i)\Bigr]^{-1}\frac1n\sum_{i=1}^n K_{h_j}(x_j, X_j^i)\varepsilon^i + o_p(n^{-2/5})$

uniformly for $x_j \in I_j$, where $C_{K,j,\ell}(x_j) = \int_{I_j}(v_j - x_j)^\ell h_j^{-\ell} K_{h_j}(x_j, v_j) dv_j$,

         $\Delta_j = -\frac12 C_K\Bigl[\int_{I_j} m_j(u_j) p_j''(u_j) du_j - 2\int_{I_j} m_j(u_j)\frac{p_j'(u_j)^2}{p_j(u_j)} du_j + \int_{I_j} m_j''(u_j) p_j(u_j) du_j\Bigr]$

and $C_K = \int u^2 K(u) du$.

We point out that $C_{K,j,\ell}(x_j)$ in the theorem is different from $\mu_{K,j,\ell}(x_j)$
defined earlier. In fact, for $K_{h_j}$ satisfying (2.6) and (2.7) it follows that

         $C_{K,j,\ell}(x_j) = \frac{\mu_{K,j,2}(x_j)\mu_{K,j,\ell}(x_j) - \mu_{K,j,1}(x_j)\mu_{K,j,\ell+1}(x_j)}{\mu_{K,j,0}(x_j)\mu_{K,j,2}(x_j) - \mu_{K,j,1}(x_j)^2}.$

We compare the estimate $\hat m_j$ with the local linear smooth backfitting
estimate $\hat m_{j,SB}$ studied by Mammen, Linton and Nielsen [9]. There are
differences at the boundary and in the interior of $I_j$. For $x_j$ in the interior
$I_j^- = \{u \in I_j : u + h_j \in I_j, u - h_j \in I_j\}$ one gets $C_{K,j,2}(x_j) = C_K$. Thus the
expansion of $\hat m_j$ becomes

(2.14)   $\hat m_j(x_j) = m_j(x_j) + \frac12 C_K h_j^2 m_j''(x_j) + h_j^2\Delta_j + \Bigl[\frac1n\sum_{i=1}^n K_{h_j}(x_j, X_j^i)\Bigr]^{-1}\frac1n\sum_{i=1}^n K_{h_j}(x_j, X_j^i)\varepsilon^i + o_p(n^{-2/5}).$

For $x_j \in I_j^-$ this expansion differs from the stochastic expansion of $\hat m_{j,SB}$
only by the constant term $h_j^2\Delta_j$; see [9] and [11]. This additive term comes
from the norming $\int_{I_j} \hat m_j(u_j)\tilde p_j(u_j) du_j = 0$. This can be easily verified by
observing that

         $\int_{I_j} m_j(u_j)\tilde p_j(u_j) du_j = \frac12 C_K h_j^2\Bigl[\int_{I_j} m_j(u_j) p_j''(u_j) du_j - 2\int_{I_j} m_j(u_j)\frac{p_j'(u_j)^2}{p_j(u_j)} du_j\Bigr] + o_p(n^{-2/5}).$

One could use other normings for estimation of $m_j$. We briefly discuss two
other normings,

(2.15)   $\hat m_{j,+}(x_j) = \hat m_j(x_j) - \int_{I_j} \hat m_j(u_j)\hat p_j(u_j) du_j,$

(2.16)   $\hat m_{j,++}(x_j) = \hat m_j(x_j) - \frac1n\sum_{i=1}^n \hat m_j(X_j^i).$

For these two modified estimates the following expansions hold.

Corollary 2.1. Under the assumptions of Theorem 2.1, the expansion
(2.13) applies for the estimates $\hat m_{j,+}(x_j)$ and $\hat m_{j,++}(x_j)$ defined at (2.15)
and (2.16), respectively, with $\Delta_j$ replaced by

         $\Delta_{j,+} = -\frac12 C_K\Bigl[\int_{I_j} m_j(u_j) p_j''(u_j) du_j + \int_{I_j} m_j''(u_j) p_j(u_j) du_j\Bigr]$

for $\hat m_{j,+}$ and by

         $\Delta_{j,++} = -\frac12 C_K\int_{I_j} m_j''(u_j) p_j(u_j) du_j$

for $\hat m_{j,++}$.

For the local linear smooth backfitting estimate $\hat m_{j,SB}$, we get the
expansion (2.14) with $\Delta_{j,SB} = 0$ for $x_j$ in the interior $I_j^-$; see [9] and [11].
There a different norming was used for a combination of the smooth
backfitting estimate of $m_j$ and its derivative; see (3.4) in [11]. The norming of
$\hat m_{j,++}$ is chosen so that the mean integrated squared error
$\int_{I_j}[\hat m_{j,++}(x_j) - m_j(x_j)]^2 p_j(x_j) dx_j$ is asymptotically minimized. Note that

         $\int_{I_j}[\hat m_{j,++}(x_j) - m_j(x_j)] p_j(x_j) dx_j = \int_{I_j} \hat m_{j,++}(x_j) p_j(x_j) dx_j = o_p(n^{-2/5}).$


Furthermore, our estimate $\hat m_j$ differs from the local linear smooth
backfitting estimate $\hat m_{j,SB}$ on the boundary $I_j \setminus I_j^-$. The estimates have
slightly different asymptotic biases on the boundary. The difference is due to
the fact that they use different boundary corrected kernels. Recall that the
local linear estimate in the univariate nonparametric regression with a
conventional kernel $K$, without boundary modification, has the asymptotic bias

         $\frac12 m''(x)\frac{\mu_{K,2}(x)^2 - \mu_{K,1}(x)\mu_{K,3}(x)}{\mu_{K,0}(x)\mu_{K,2}(x) - \mu_{K,1}(x)^2} h^2;$

see [3]. Here $m$ is the nonparametric regression function, $h$ is the bandwidth,
$\mu_{K,\ell}(x) = \int_I (u - x)^\ell h^{-\ell} K_h(u - x) du$ for $\ell \ge 0$ and $I$ is the support of the
covariate. A similar bias expansion holds for the local linear smooth
backfitting estimate $\hat m_{j,SB}(x_j)$. Recall that in the construction of $\hat m_{j,SB}$,
boundary corrected kernels $K^*_{h_k}$ that satisfy $\int_{I_k} K^*_{h_k}(x_k, v_k) dx_k = 1$ for all
$k$ (including $j$) are used. Note that this moment condition is different from
(2.6) but is the same as (2.10) that we require on $L$. By an extension of the
arguments given in [9] and [11], one gets for the bias of $\hat m_{j,SB}(x_j)$ the
expansion

         $\frac12 m_j''(x_j)\frac{C_{K^*,j,2}(x_j)^2 - C_{K^*,j,1}(x_j)C_{K^*,j,3}(x_j)}{C_{K^*,j,0}(x_j)C_{K^*,j,2}(x_j) - C_{K^*,j,1}(x_j)^2} h_j^2,$

where $C_{K^*,j,\ell}$ is defined in the same way as $C_{K,j,\ell}$ but with $K_{h_j}$ being
replaced by $K^*_{h_j}$. The bias expansion of our estimate is simplified since
$C_{K,j,0} = 1$ and $C_{K,j,1} = 0$ from (2.6) and (2.7), respectively.

The asymptotic variances of our estimate $\hat m_j$ and the local linear smooth
backfitting estimate $\hat m_{j,SB}$ are also slightly different on the boundary. They
are identical in the interior of $I_j$, however. The difference on the boundary
is also due to the use of different kernels as is discussed above.

We now give motivation for our choice of $\hat\pi_{jk}$ when $d = 2$. We give some
heuristic arguments why our proposal is a second-order modification of local
linear smooth backfitting. We restrict the discussion to points in the interior
of $I_j$ and for simplicity we neglect boundary effects. For this reason in the
heuristics we use convolution kernels that are not corrected at the boundary.
The local linear smooth backfitting estimate of Mammen, Linton and Nielsen
[9] is defined as the minimizer of

(2.17)   $\sum_{i=1}^n \int [Y^i - \hat m_0 - \hat m_1(x_1) - \hat m_1^*(x_1)(X_1^i - x_1) - \hat m_2(x_2) - \hat m_2^*(x_2)(X_2^i - x_2)]^2 K_{h_1}(X_1^i - x_1) K_{h_2}(X_2^i - x_2) dx_1 dx_2.$

Here $\hat m_1$ and $\hat m_2$ are the estimates of the additive components $m_1$ and $m_2$,
respectively, and $\hat m_1^*$ and $\hat m_2^*$ are the estimates of the slopes of $m_1$ and $m_2$.
Minimization of (2.17) with respect to $\hat m_1(x_1)$ and $\hat m_1^*(x_1)$ for fixed $x_1$ and
for fixed functions $\hat m_2(\cdot), \hat m_2^*(\cdot)$ leads to

(2.18)   $0 = \sum_{i=1}^n \int [Y^i - \hat m_0 - \hat m_1(x_1) - \hat m_1^*(x_1)(X_1^i - x_1) - \hat m_2(x_2) - \hat m_2^*(x_2)(X_2^i - x_2)] \begin{pmatrix} 1 \\ X_1^i - x_1 \end{pmatrix} K_{h_1}(X_1^i - x_1) K_{h_2}(X_2^i - x_2) dx_2.$

This equation is used in the smooth backfitting algorithm for updating $\hat m_1$
and $\hat m_1^*$. We modify this equation so that the slope estimates $\hat m_1^*$ and $\hat m_2^*$ do
not enter the updating equation and thus the algorithm only keeps track of
the values of $\hat m_1$ and $\hat m_2$.

We first discuss how $\hat m_2^*$ can be dropped. The basic idea is to replace
equation (2.18) by

(2.19)   $0 = \sum_{i=1}^n \int [Y^i - \hat m_0 - \hat m_1(x_1) - \hat m_1^*(x_1)(X_1^i - x_1) - \hat m_2(x_2)] \begin{pmatrix} 1 \\ X_1^i - x_1 \end{pmatrix} K_{h_1}(X_1^i - x_1) L_{h_2}(X_2^i - x_2) dx_2.$

Here, $L_{h_2}$ is a kernel such that the right-hand sides of (2.18) and (2.19)
are asymptotically equivalent. This can be accomplished by choosing $L_{h_2}$ so
that

         $\int [\hat m_2(x_2) + \hat m_2^*(x_2)(X_2^i - x_2)] K_{h_2}(X_2^i - x_2) dx_2 \approx \int \hat m_2(x_2) L_{h_2}(X_2^i - x_2) dx_2.$

This is done if we choose $L$ to satisfy $\int L(u) du = 1$, $\int u L(u) du = 0$ and
$\int u^2 L(u) du = -\int u^2 K(u) du$ since

         $\hat m_2(x_2) \approx \hat m_2(X_2^i) - \hat m_2'(X_2^i)(X_2^i - x_2) + \frac12 \hat m_2''(X_2^i)(X_2^i - x_2)^2,$

         $\hat m_2^*(x_2) \approx \hat m_2'(X_2^i) - \hat m_2''(X_2^i)(X_2^i - x_2).$
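
The step from the two Taylor expansions to the moment conditions on $L$ can be
written out as follows. This is a worked version of the heuristic, with
$t = X_2^i - x_2$, all derivatives of $\hat m_2$ evaluated at $X_2^i$, boundary
effects ignored and remainder terms dropped.

```latex
\begin{align*}
\int [\hat m_2(x_2) + \hat m_2^*(x_2)\, t]\, K_{h_2}(t)\, dx_2
  &\approx \int \bigl[\hat m_2 - \hat m_2' t + \tfrac12 \hat m_2'' t^2
      + \hat m_2' t - \hat m_2'' t^2\bigr] K_{h_2}(t)\, dx_2 \\
  &= \hat m_2(X_2^i) - \tfrac12 \hat m_2''(X_2^i)\, h_2^2 \int u^2 K(u)\, du, \\[4pt]
\int \hat m_2(x_2)\, L_{h_2}(t)\, dx_2
  &\approx \hat m_2(X_2^i) \int L(u)\, du
      - \hat m_2'(X_2^i)\, h_2 \int u L(u)\, du
      + \tfrac12 \hat m_2''(X_2^i)\, h_2^2 \int u^2 L(u)\, du .
\end{align*}
```

Matching the coefficients of $\hat m_2$, $\hat m_2'$ and $\hat m_2''$ in the two lines
gives $\int L(u) du = 1$, $\int u L(u) du = 0$ and
$\int u^2 L(u) du = -\int u^2 K(u) du$.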

It remains to modify (2.19) further so that $\hat m_1^*$ does not appear. This can
be easily achieved by solving (2.19) with respect to $\hat m_1$. It gives

         $\hat m_1(x_1) = \Bigl[\sum_{i=1}^n (X_1^i - x_1)^2 K_{h_1}(X_1^i - x_1) \sum_{i=1}^n Z^i K_{h_1}(X_1^i - x_1) - \sum_{i=1}^n (X_1^i - x_1) K_{h_1}(X_1^i - x_1) \sum_{i=1}^n Z^i (X_1^i - x_1) K_{h_1}(X_1^i - x_1)\Bigr]$
         $\qquad \times \Bigl[\sum_{i=1}^n (X_1^i - x_1)^2 K_{h_1}(X_1^i - x_1) \sum_{i=1}^n K_{h_1}(X_1^i - x_1) - \Bigl\{\sum_{i=1}^n (X_1^i - x_1) K_{h_1}(X_1^i - x_1)\Bigr\}^2\Bigr]^{-1} - \hat m_0$

with $Z^i = Y^i - \int \hat m_2(x_2) L_{h_2}(X_2^i - x_2) dx_2$. This is equivalent to

         $\hat m_1(x_1) = \check m_1(x_1) - \hat m_0 - \int \hat m_2(x_2)\frac{\tilde p_{12}(x_1, x_2)}{\tilde p_1(x_1)} dx_2,$

which implies

         $\hat m_1(x_1) = \tilde m_1(x_1) - \int \hat m_2(x_2)\hat\pi_{12}(x_1, x_2) dx_2.$

The above argument is valid for $x_j \in I_j^-$. For the boundary area $I_j \setminus I_j^-$,
it continues to hold if one uses the boundary corrected kernel $L_{h_2}(x_2, X_2^i)$
instead of $L_{h_2}(X_2^i - x_2)$ and $K_{h_1}(x_1, X_1^i)$ instead of $K_{h_1}(X_1^i - x_1)$.

3. Numerical properties. In this section we compare some numerical
properties of the new and the local linear smooth backfitting estimates. For
this, we drew 500 datasets $(X^i, Y^i)$, $i = 1, \ldots, n$, with $n = 100$ and $400$ from
the model

(M1)     $Y^i = m_1(X_1^i) + m_2(X_2^i) + m_3(X_3^i) + \varepsilon^i,$

where $m_1(x_1) = x_1^2$, $m_2(x_2) = x_2^3$, $m_3(x_3) = -x_3^4$ and $\varepsilon^i$ are distributed as
$N(0, 0.01)$. The covariates were generated from truncated three-dimensional
normal distributions with marginals $N(0.5, 0.5)$ and correlations
$\rho_{12} = \rho_{13} = \rho_{23} = \rho$, where $\rho_{ij}$ denotes the correlation between $X_i$ and $X_j$.
The truncation was done for the covariates to have the compact support $[0, 1]^3$. To
be specific, a random variate generated from one of the three-dimensional
normal distributions was discarded if one of the covariates fell outside the
interval $[0, 1]$. The correlation levels used were $\rho = 0$ and $0.5$. The
kernel that we used for the backfitting algorithm was the biweight kernel
$K(u) = (15/16)(1 - u^2)^2 I_{[-1,1]}(u)$. For the local linear smooth backfitting
estimate, we used $K^*_{h_j}$ that satisfy $\int K^*_{h_j}(u, v) du = 1$ for all $j$, but neither
(2.6) nor (2.7). For a fair comparison, we used for the new estimate the
conventional kernels $K_{h_j}(v - u)$ instead of $K_{h_j}(u, v)$ given in (2.8). Also, both
the new and the local linear smooth backfitting estimates were recentered
according to the formula (2.16).
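
The data-generating mechanism described above (draw from a three-dimensional
normal distribution, discard any draw with a coordinate outside $[0, 1]$, then
add the three component functions and Gaussian noise) can be sketched as
follows. Several details are assumptions made for this illustration: the
second parameter of the normal distributions is read as a variance, the
component functions are taken as $x_1^2$, $x_2^3$ and $-x_3^4$, and the
function names are hypothetical.

```python
import numpy as np

def draw_covariates(n, rho, rng, d=3):
    """Rejection sampling from a d-dimensional normal truncated to [0, 1]^d.
    Assumes marginal mean 0.5, marginal variance 0.5 and equal correlations rho."""
    cov = 0.5 * ((1.0 - rho) * np.eye(d) + rho * np.ones((d, d)))
    kept = np.empty((0, d))
    while kept.shape[0] < n:
        cand = rng.multivariate_normal(np.full(d, 0.5), cov, size=2 * n)
        inside = np.all((cand >= 0.0) & (cand <= 1.0), axis=1)
        kept = np.vstack([kept, cand[inside]])  # discard draws outside the cube
    return kept[:n]

def draw_sample(n, rho, rng):
    """One pseudosample (X, Y) from the additive model with three components."""
    X = draw_covariates(n, rho, rng)
    eps = rng.normal(0.0, 0.1, size=n)  # standard deviation 0.1, variance 0.01
    Y = X[:, 0] ** 2 + X[:, 1] ** 3 - X[:, 2] ** 4 + eps
    return X, Y
```

A typical call would be `draw_sample(400, 0.5, np.random.default_rng(0))`,
repeated 500 times to mimic the number of pseudosamples used in the study.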

Figures 1 and 2 and Table 1 summarize the results. The target functions
are $m_j - E m_j(X_j^i)$ rather than $m_j$ since $E m_j(X_j^i) \ne 0$. Figures 1 and 2
depict the bias, the variance and the mean squared error curves of the new
and the local linear smooth backfitting estimates, which are based on 500
pseudosamples of size 400. The results for the samples of size 100 are not
presented here, but they give the same message as those for the samples of
size 400. Table 1 shows the integrated squared biases, integrated variances
and integrated mean squared errors. It is observed from Figures 1 and 2
that the bias property of the new estimate $\hat m_j$ is nearly the same as that
of the local linear smooth backfitting estimate $\hat m_{j,SB}$ in the interior and on
the boundary. In the interior the variance properties of the two estimates
are also nearly the same, while on the boundary the new estimate is seen to
be slightly more stable. Because of less variability on the boundary, the new
estimate has a slightly improved mean integrated squared error property, as
shown in Table 1.

The bandwidths $h_j$ used for these results were chosen as

(3.1)    $h_j = n^{-1/5}\Bigl[\frac{E(\varepsilon^i)^2 \int K^2(t) dt}{C_K^2 \int m_j''(u_j)^2 p_j(u_j) du_j}\Bigr]^{1/5}.$

This is the optimal bandwidth for local linear smoothing in univariate
nonparametric regression models (i.e., additive models with one additive
component) and also the optimal bandwidth for the local linear smooth
backfitting estimate $\hat m_{j,SB}$; see [11] for the latter. In additive models the optimal
bandwidth depends on the norming of the estimate. In particular, for the
MISE-optimal norming we get the estimate $\hat m_{j,++}(x_j)$ (see the discussion
after Theorem 2.1) and an asymptotically optimal bandwidth that is defined
as in (3.1) but with $m_j''(u_j)$ replaced by $m_j''(u_j) - \int_{I_j} m_j''(v_j) p_j(v_j) dv_j$. We
used the bandwidth as defined in (3.1). In this respect we follow the usual
practice in classical nonparametric regression and do not minimize MISE by
using estimates of $\int m_j''(u) p_j(u) du$ that have parametric rate $n^{-1/2}$. Note
that in univariate nonparametric regression an estimate $\hat m(x)$ could be
improved by the modification $\hat m^*(x) = \hat m(x) - n^{-1}\sum_{i=1}^n \hat m(X^i) + n^{-1}\sum_{i=1}^n Y^i$.
For example, if $\hat m(x)$ is the local linear smoother, then the asymptotic bias
of $\hat m^*(x)$ equals $\frac12 C_K [m''(x) - \int m''(u) p(u) du] h^2$, leading to a smaller
first-order integrated squared bias. We tried other fixed bandwidths around the
optimal bandwidth (3.1), but the lessons were essentially the same.

Fig. 1. Bias, variance and mean squared error curves when $\rho = 0$. The solid curves
correspond to the new estimates $\hat m_j$, and the dashed curves are for the local linear smooth
backfitting estimates $\hat m_{j,SB}$. The three rows show the bias, the variance and the mean
squared error curves. In each row, the leftmost panel corresponds to $m_1$ and the next two
to the right are for $m_2$ and $m_3$. These are based on 500 pseudosamples of size 400.

Fig. 2. Bias, variance and mean squared error curves when $\rho = 0.5$. Line types and
arrangement of panels are the same as in Figure 1. These are also based on 500
pseudosamples of size 400.
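
To get a feeling for the size of such a bandwidth, the sketch below evaluates
the rule $h = n^{-1/5} [\sigma^2 \int K^2 / (C_K^2 \int (m'')^2 p)]^{1/5}$ for the
biweight kernel, for which $\int K^2(t) dt = 5/7$ and $C_K = \int u^2 K(u) du = 1/7$.
The inputs are illustrative assumptions: error variance $0.01$, sample size
$400$, and a hypothetical component with constant second derivative $2$, so
that $\int (m'')^2 p = 4$ for any design density. The function name is
hypothetical.

```python
def optimal_bandwidth(n, sigma2, k_sq_int, c_k, curvature):
    """h = n^(-1/5) * [sigma2 * int K^2 / (C_K^2 * int (m'')^2 p)]^(1/5)."""
    return n ** (-0.2) * (sigma2 * k_sq_int / (c_k**2 * curvature)) ** 0.2

# Biweight kernel constants: int K^2 = 5/7 and C_K = 1/7.
h = optimal_bandwidth(n=400, sigma2=0.01, k_sq_int=5.0 / 7.0, c_k=1.0 / 7.0, curvature=4.0)
print(round(h, 4))  # about 0.1853
```

Because of the exponent $1/5$, the bandwidth is not very sensitive to moderate
errors in the variance or curvature inputs.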

4. Assumptions and proofs. 



4.1. Assumptions. Below, we collect the assumptions used in this paper. 
Table 1
Integrated squared bias, integrated variance and integrated mean squared error,
multiplied by $10^3$, of the new and the local linear smooth backfitting
estimates based on 500 pseudosamples of size 400

Corr.           Target      Estimate            Integrated   Integrated   Integrated
level           function                        sq. bias     variance     MSE
------------------------------------------------------------------------------------
$\rho = 0$      $m_1$       $\hat m_{1,SB}$     0.0073       0.3479       0.3552
                            $\hat m_1$          0.0071       0.3234       0.3305
                $m_2$       $\hat m_{2,SB}$     0.0070       0.4027       0.4097
                            $\hat m_2$          0.0081       0.3768       0.3849
                $m_3$       $\hat m_{3,SB}$     0.0136       0.4040       0.4176
                            $\hat m_3$          0.0138       0.3660       0.3798
$\rho = 0.5$    $m_1$       $\hat m_{1,SB}$     0.0114       0.3910       0.4024
                            $\hat m_1$          0.0114       0.3657       0.3771
                $m_2$       $\hat m_{2,SB}$     0.0179       0.3928       0.4107
                            $\hat m_2$          0.0179       0.3629       0.3808
                $m_3$       $\hat m_{3,SB}$     0.0334       0.3967       0.4301
                            $\hat m_3$          0.0326       0.3601       0.3927


(A1) The kernel $K$ is bounded and symmetric about zero. It has compact
     support ($[-1, 1]$, say) and is Lipschitz continuous.
(A2) The $d$-dimensional vector $X^i$ has compact support $I = I_1 \times \cdots \times I_d$ for
     bounded intervals $I_j$ and its density is bounded from below and from
     above on $I$.
(A3) $E(\varepsilon^i)^2 < +\infty$.
(A4) The functions $m_j''$, $p_j'$, $D_{x_j}^2 p_{jk}(x_j, x_k)$ for $1 \le j, k \le d$ exist and are
     continuous, where $D_{x_j}$ denotes the partial derivative operator with
     respect to $x_j$ and $D_{x_j}^2$ is the operator of order 2.
(A5) The bandwidths $h_1, \ldots, h_d$ are of order $n^{-1/5}$.



4.2. Proof of Theorem 2.1. Define rjl- = mkiXl) - E[mk{Xl)\Xi] and 



~ A / \ 



rhj {xj) = mo + mj{xj) + }^ / ' ' 



mk{xk)dxk 



(4.1) 



1 



+ -CK,j,2{xj)h' 



m';{x,) + Y.Dlf 
Mi 



Pjk {xj , Xfc) 



mk{xk)dxk 
For the local linear estimate $\check m_j$, the following expansion holds:

(4.2)    $\check m_j(x_j) = \check m_j^A(x_j) + \check m_j^B(x_j) + o_p(n^{-2/5})$

uniformly for $x_j \in I_j$. These expansions follow by standard asymptotic
smoothing theory. Define now

mf{xj) = mj{xj) + \CK,j,2{xj)h?jm"j{xj), 

rnf{xj) = rhf{xj). 

We will show that for S = A,B 



(4.3) mf (xj) = m^ixj) - ^ / mf (x^) 



-S f \Pjk{Xj,Xk) 



Pj[Xjj 



dxk + Op{n ^/^) 



uniformly for $x_j \in I_j$. Below we argue that (4.3) implies the statement of
Theorem 2.1. The proof of (4.3) will be given afterwards.

We apply Theorems 2 and 3 in [9]. We will do this with our $\tilde p_{jk}$, $\tilde p_j$, $\tilde m_j$,
$\hat m_j$, respectively, taking the roles of their $\hat p_{jk}$, $\hat p_j$, $\hat m_j$, $\tilde m_j$. It is easy to verify
the conditions of these theorems. From their Theorem 2 with $S_j = I_j$ and
$\Delta_n = n^{-2/5}$ together with our (4.3) we get



(4.4) mf{xj)=mf{xj) 



Pj (u) du 



mf{u)pj{u)du + Op{n ^/^) 



uniformly for Xj G Ij. Here for S = A,B the random function fhj is defined 

by 



fhj {xj ) = fhj {xj ) 



Pj (u) du 



-1 



ihj {u)pj (u) du 



fhj{u)pj (u) du = 0. 

It is easy to check that the second term on the right-hand side of (4.4) is of 
order Op(n~^/^). Therefore we have 



(4.5) 

Note that 
(4.6) 



fhf {xj ) = fhf {xj ) + Op (n ) . 



fhj{xj) = fhf{xj) + fhf (xj). 
We now apply Theorem 3 in [9] with anj{xj) = rn^{xj), (3{x) = 0, Jln,o = 0)
ttn.o = 0, Sj = Ij and A„ = This gives 

(4.7) (xj) =m^ (xj) — Jpj{u)du J {u)pj{u) du + Op{n~'^^^) 

uniformly for Xj G Ij. Note that up to terms of order Op{n~'^/^) the second 
term on the right-hand side of (4.7) is asymptotically equal to a deterministic 
sequence. In the statement of Theorem 3 this sequence was called 7nj- The 
statement of Theorem 2.1 easily follows from (4.5)-(4.7). 

We remark that in Assumption (A2) in [9] it was assumed that Pjk{xj,Xk) = 
PkjixkjXj) (in the notation of the current paper). Our choice of pjk does not 
satisfy this symmetry constraint. It can be checked that Theorems 2 and 3 in 
[9] continue to hold when this symmetry constraint is dropped. Let us also 
mention that in their Assumption (A9) of Theorem 3 / an,j{u)pj{u) du = 
7n,j — Op{An) should be replaced by the correct assumption [/ Pj{u) du\~^ x 

/ an,j{u)pj{u) du = -fnj + Op{An). 

It remains to show (4.3). 

Proof of (4.3) for S = B. We first note that the following expansions 
hold: 

Pjk{Xj,Xk) _ Pjk{Xj,Xk) ^ ^ 1^ ^^2 Pjk{Xj,Xk) ^, ^2 
~ H-^) ^ p,(x,)3 ^^(^^-^ 

- CK,j,2{xj)hj[D^^Pjk{Xj,Xk)]—, — :-^+Op{n~ ' ), 

Pj [Xj ) 

uniformly for Xj E Ij and Xk £ IJ^ , and 

(4 9) Pjkixj,xk) ^ Pjk{xj,Xk) ^ n („-2/5)^ 

Pj{Xj) Pj{Xj) 

uniformly for Xj € Ij and Xk € Ik \ Ik ■ These claims immediately follow from 

(4.10) pTjixj) = CK,j,2{xj)h]p'j{xj) + Op(n-2/5), 

(4.11) p;*{xj) = CK,j,2{xj)h]pj{x,) + Op(n-2/5), 

(4-12) p'j^{xj,Xk) = CK,j,2{xj)h^jD^.pjk{xj,Xk) + Op{n~'^/^), 

uniformly for Xj G Ij and Xk G and pj{xj), pj*{xj), p*j^{xj,Xk) are all 

Op(n~^/^) uniformly for Xj G Ij and Xk ^ Ik\Ik • 
Now, it follows that uniformly for Xj G Ij 

mk{xk)pjk{xj,Xk)dxk 
1 ■ f

n J Ik 



Tl . ^ 

1=1 

2 



1 



-Y,Kh^{x,,X])ml{Xl 
PjkiX'j,Xk) 



n . , 
1=1 



+ Op(n-2/5) 



n 



i=l 



mk{xk)dxk 



1 " 

+ -T.^h,{xj,X'j)rilj 



n f- , 



^C/</ifc /" pjk{xj,Xk)m'l{xk) dxk + Op{n ^Z^) 



(4.13) =Pj(2;i) / ^^''^^/'^y^ mk{xk)dxk 



Pj{3 

+ CKJ,2{Xj)hpj{Xj 



Pjk{Xj,Xk) 



mk{xk)dxk 



P'jjXj) 
Pj{Xj) 



Pjkixj ) Xk) 



mk{xk)dxk 



2 "^74 Pjixj) 

If 1 " 

-Cxhl / pjk{xj,Xk)ml{xk)dxk + -^Kh.{xj,Xj)rjlj 



+ Op(n-2/5). 



Furthermore, 



(4.14) 



-CKhlm'kixk) 



Pjk{xj,Xk) 
Pj{Xj) 



dxk 



P2h(^l2^mUxk)dxk + o,{n-'/'). 
h Pj [Xj ) 



Using (4.8), (4.13) and (4.14) gives 

■dxk 



B/ \Pik[Xj,Xk) 

^kK^k)— 



(4.15) 



Pjixj) 

Pjk{xj,Xk) , . I ^Kh^{xj,X]) 

/ mk{xk) dxk + -2^ — 
j-~.2 Pjk{Xj,Xk)



Pj{Xj) 



Plugging (4.15) and (4.1) into the right-hand side of (4.3) gives uniformly 
for Xj G Ij , 

^fi^j)-y] [ ^fc (^fc) ^^-^^'^^'* dxk 

= mj{xj) + ^CK,j,2{xj)hjmj (xj) + Op{n~'^^^) 

= mf{xj)+Op{n~'^/^). 
This shows (4.3) for S = B. 

Proof of (4.3) for S = A. We have to show for j and Xj G Ij, 

rhi{xkf-^^^^dxk = Op{n-'">). 

Pj{Xj) 

For this claim, it suffices to show that for j and xj € Ij, 

.PM^^^'^>^\dxk = Op{n-^/'), 



(4.16) 
(4.17) 
(4.18) 



rhkixk)- 



Pj{Xj 



rhkixk) ' \ 'Axjfdxk = Op(l), 



rhk{xk)[DxjPjk{xj,Xk)] ^\ dxk = Op(l). 

Pj [Xj ) 



For the proof of (4.16)-(4.18), note that the left-hand sides of these equations
can be written as $n^{-1}\sum_{i=1}^n w_i(x_j)\varepsilon^i$ where the weights $w_i(x_j)$ depend on
$n, X^1, \ldots, X^n, x_j$, but not on $\varepsilon^1, \ldots, \varepsilon^n$. By standard smoothing theory it
can be shown that in all three cases



sup \wi{xj)\ = Op{l), 

l<i<n,Xj^Ij 

These bounds imply 



sup 



K(x,)| = Op(i). 



(4.19) 



sup 



-^Wi{xj)et 
n ^ 



i=l 



We give a short outline of the proof for (4.19). 

Choose $C > 0$ and consider the event $E$ that $|w_i(x_j)| \le C$ and $|w_i'(x_j)| \le C$
for $1 \le i \le n$ and $x_j \in I_j$. We define



Wi{Xj) 



Wi[Xj} 

1, 



on E, 
elsewhere. 
Furthermore, for $\delta > 0$ small enough define



Note that 



P 



max eJ > n 

l<i<n 



1/2+S 



< n~^^Eel = o(l) 



and that 



Ii?£il(|ei| <nV2+5)| = \Eeili\ei\>n'/^+')\ 
Therefore on E we get 



n 



n ■ 



Thus, it remains to show that 



(4.20) 



sup 



1 



n 



For the proof of (4.20) we argue that 

1 



(4.21) 



sup 



n 



1=1 



and that for each A > there exist constants C", C" > such that 



(4.22) sup P 

^1^1 j L'" i=l 

We prove (4.22). On the event E, 



n ~ 



<C'exp(-C"n^/io-^). 



P 



1 " 



n 



i=l 



< exp(-n^/^~'^An~^/^)£;exp 

< exp(-A7i^/io^^) 

n 

X ^[1 + ^(n-i-2^I(Ji(xj)^e- exp(2n-i/2-<5|yj.(^^.)|„i/2+<5))] 
<exp(-A?ii/io-^)[][l + C2exp(2C7)?i-i-25^^2]

i=l 

= O(l)exp(-Ani/i0-'^). 
This shows (4.22) and completes the proof of Theorem 2.1. 